DETAILED ACTION

Response to Amendment 

1. In response to the Office action of 8/6/2009, the applicant has submitted an
amendment, filed 9/25/2009, amending independent claims 1, 2 and 8 and
dependent claims 3-5, 7, 9-10 and 12, while traversing the prior art rejection.
Applicant's arguments have been fully considered; however, the previous rejection is
maintained for the reasons given below in the Response to Arguments.

Response to Arguments 

2. On page 10, the last ¶, and page 11, the first ¶, the applicant has contended that
the primary reference used in the first action (Asano, US 2004/0054531) did not qualify
as prior art because its WIPO document indicated that its PCT filing date (10/21/2002)
corresponded to a Japanese PCT application that was not published in English as
required, and that the said PCT filing date therefore could not be used against the
foreign priority date of the applicant (11/12/2003). The applicant used similar reasoning
for other claims on page 13 (4th ¶, line 3, with regard to claims 2-3, 8-12) and on
page 17 (5th ¶, line 3, with regard to claim 5).

However, according to the patented version of the publication (US 7,031,917),
this application possesses a 371(c) filing date of 6/19/2003. According to 37 CFR 1.9, a
national stage application from an international application under 35 U.S.C. 371 (e.g., the
371(c) date above) is treated as a national application, and the 371(c) filing date is
accepted as a US national filing date. Because the said filing date (6/19/2003) qualifies this
application as a 102(e) reference, those arguments are not persuasive and
Asano is a valid prior art reference.

On page 11, the last ¶, and on page 12, the 1st ¶, the applicant has simply
rewritten the amended version of the first claim. On page 12, ¶s 2-4, the applicant has
simply made some remarks regarding applicant's understanding of the teachings of
Asano. Finally, on page 12, the last ¶, and page 13, the first ¶, the applicant has
contended that because "the acoustic models in Asano are produced based on the
distance between the microphone and the sound source and not adjusted to the sound
direction . . . ." (page 13, first ¶, lines 2-4), the claim is patentably distinguishable over
Asano.

The examiner respectfully points to page 3 of the first action, 3 lines above the
bottom, where the examiner had clarified that point by replacing the phrase
"direction-dependent acoustic models" in the last limitation of claim 1 with
"distance-dependent acoustic models"; ¶ 0114 of Asano was thereby used for that specific
teaching. However, as pointed out at the bottom of page 4 of the first action, Asano in ¶ 0129
(whose teachings were not discussed by the applicant!) also teaches determining the
direction of a sound source, although it does not specifically teach storing
direction-dependent acoustic models in localizing sound sources. For those reasons a
35 U.S.C. 103(a) rather than a 102(e) rejection was used, as Asano teaches obtaining the
direction of a sound source as well as storing distance-dependent acoustic models,
rendering it obvious to try to store direction-dependent acoustic models for the
motivation that was described in that claim. Therefore those arguments are not
persuasive.

On page 13, the 3rd ¶, the applicant has contended that claims 4, 6 and 7 are
patentable for the same reasons as the allegedly patentable claim 1 discussed
above, without discussing their own patentable subject matter; those arguments are
therefore not persuasive.

On page 13, the last ¶, and page 14, the first ¶, the applicant has simply rewritten
the amended claim 2 without any discussion of the Office action. Likewise, on page 14,
the last ¶, and page 15, the first ¶, the applicant has rewritten the amended claim 8
without any discussion of the Office action.

On page 16, the first and second ¶s, the applicant has simply rewritten parts of
claim 2 in the first ¶, for which the examiner had used Ito et al. (US 7,076,433), and in the
second ¶ has contended that the remainder of the claim, for which Asano was used, is
patentable for the same reasons that the applicant had provided for claim 1. For the first
part, on page 16, the last ¶, the applicant has simply rewritten parts of Ito's teachings and
broadly concluded that they fail to teach the parts of claim 2 for which the examiner had
used them, without explaining which specific teaching of Ito fails to teach which specific
part of the claim limitations for which it was used, and without providing any persuasive
arguments. The applicant should have provided arguments against the examiner's mappings
and pointed out why those mappings, which were very clearly pointed out by enclosing them
inside parentheses, fail to teach applicant's claimed invention.

For the second part (i.e., the part of claim 1 for which Asano was used), the
applicant has simply remarked that the limitations are patentable for the same reasons as
the allegedly patentable claim 1; those remarks are therefore not persuasive.

On page 17, the 3rd ¶, the applicant has contended that claim 8 is patentable for
the same reasons as the allegedly patentable claim 1; that argument is therefore not
persuasive for the same reasons.

On page 17, the 4th ¶, the applicant has contended that claims 3 and 9-12 are
patentable by "virtue of their dependency" on their allegedly patentable parent claims,
without discussing anything about their possible specific allowable subject matter; those
arguments are therefore not persuasive.

On page 17, the last ¶, and page 18, regarding claim 5, the applicant has
contended that it is patentable over the prior art references, again without showing which
specific part of that claim's mapping to Okuno et al. (US 7,035,418) has failed to teach
which specific part of claim 5. The applicant here in the REMARKS has not provided
any persuasive arguments against the examiner's mappings, or explained why those mappings
(enclosed within parentheses) and the reasoning, including motivations to combine, fail;
therefore all those arguments are moot.

Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount
to a general allegation that the claims define a patentable invention without specifically
pointing out how the language of the claims patentably distinguishes them from the
references.

The applicant is respectfully directed to page 12, the last ¶, and page 13 of the
Office action, where an entire page was devoted to how the examiner had mapped the
teachings of the said reference to the limitations (of claim 5) for which the said reference
was used.

Therefore all the claims stay rejected. The office action follows next. 

Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 1, 4, 6-7 are rejected under 35 U.S.C. 103(a) as obvious over Asano (US
2004/0054531).

Regarding claim 1, Asano does teach an automatic speech recognition system,
which recognizes speeches in acoustic signals detected by a plurality of microphones 
as character information, the system comprising: 

a sound source localization module configured to localize a sound direction
corresponding to a specified speaker based on the acoustic signals detected by the
plurality of microphones (¶ 0129 teaches that the head unit 3 in Fig. 2 (performing a similar
function to the sound source localization module) enables obtaining the direction of
sound utilizing a plurality of microphones at a target (e.g., a robot) which receives the
speech signals, by computing power and phase differences of speech signals due to
sources attributed to users (speakers); the Abstract teaches that all incoming speech
undergoes speech recognition utilizing plural sets of acoustic models);

a feature extractor configured to extract features of speech signals contained in
one or more pieces of information detected by the plurality of microphones (the feature
extractor unit 101 in Fig. 9 receives speech data from microphones (unit 21 in Fig. 9) via
the analog to digital converter. ¶ 0104 lines 1-4 teach extracting feature vectors of the
incoming speech data);

an acoustic model memory configured to store distance-dependent acoustic
models that are adjusted to a plurality of distances at intervals (¶ 0131 lines 4-8,
referring to Fig. 9, disclose storing acoustic models corresponding to selected distances
in the database units 104_1 ... 104_N (located in the memory module 42 in Fig. 3);
these data are used following determination of the distance of a sound source attributed to
a user; ¶ 0106 teaches that all feature vectors corresponding to acoustic analysis (¶ 0104
lines 1-2) are stored in speech periods (intervals) corresponding to an utterance);

an acoustic model composition module configured to compose an acoustic model
adjusted to the sound direction, which is localized by the sound source localization
module, based on the distance-dependent acoustic models in the acoustic model
memory, the acoustic model composition module also configured to store the acoustic
model in the acoustic model memory (¶ 0114 teaches that "N" acoustic models
corresponding to "N" sound sources located at "N" distances are produced (composed),
where each acoustic model corresponding to a certain distance (localization) is stored in
a certain database (e.g., one of the units 104_1 to 104_N in Fig. 9));

and a speech recognition module configured to recognize the features extracted
by the feature extractor as character information using the acoustic model composed by
the acoustic model composition module (¶ 0132 (lines 2-1 above ¶ 0133) teaches that
module units 41B and 41A in Fig. 3, as disclosed in step S4 in Fig. 10, enable
performing speech recognition utilizing feature vectors extracted from the speech data
(¶ 0132 lines 1-3); ¶ 0108 lines 1-4 teach that the acoustic model databases used for the
speech recognition have stored in them acoustic characteristics of phonetic-linguistic units
such as phonemes or syllables (e.g., character units comprising words)).

Asano does not specifically disclose storing direction-dependent acoustic models 
in its acoustic model memory for time intervals corresponding to speech signals. It 
would have therefore been obvious to one with ordinary skill in the art at the time the 
invention was made to not only store distance-dependent acoustic models in the 
memory as is done here in modules 104 in Fig. 9, but also direction-dependent acoustic 
models obtained by using the stored distance-dependent acoustic models in obtaining 
the direction of sound using the methods of ¶ 0129 so that the robot will have bias for
direction as well as distance and will point at the direction of the source of sound to 
reduce detecting wrong signals and enhance its speech recognition performance. 

Regarding claim 4, Asano does teach a system according to claim 1, wherein the
sound source localization module is further configured to employ a scattering theory to 



Application/Control Number: 10/579,235 Page 9 

Art Unit: 2626 

generate a model for an acoustic signal, which scatters on a surface of a member
(¶ 0142 lines 4-9 and ¶ 0145 teach that, using reflected (scattered) ultrasonic wave pulses
from an obstacle, the robot can determine the distance of a user producing speech from the
robot, which according to ¶ 0131 is used in generating distance-dependent acoustic
models; all these processes are made possible by the sensor unit 111 (Fig. 11)
attached to the robot);

specifying the sound direction for the speaker with the intensity difference and 
the phase difference detected from the plurality of microphones (¶ 0129 lines 1-5);

Asano does not specifically teach using scattered (reflected) waves from the 
surface to which the microphones are attached to determine the sound direction 
attributed to a user (speaker). But it would have been obvious to one with ordinary skill 
in the art at the time the invention was made to utilize the sensor unit 111 to create a
second model in which one determines the phase and power difference of the sound 
waves reflected (scattered) from the surface of the robot in determining the direction of 
the speech signals and thereby create a second model of determination of direction of a 
sound wave incident on the robot by analyzing reflected rather than incident waves on 
the robot and thereby help in generating more accurate sound direction by 
benchmarking the results of the two models against one another. 

Regarding claim 6, Asano does not specifically disclose a system according to
claim 1, wherein the acoustic model composition module is configured to compose an
acoustic model for the sound direction by applying weighted linear summation to the
direction-dependent acoustic models in the acoustic model memory, and weights
introduced into the linear summation are determined by training.

Asano however does teach using N stored distance-dependent acoustic models
(units 104 in Fig. 9) to select the one acoustic model among them which has the closest
match for a sound source located within the range of the distances
corresponding to the said acoustic models, by using its matching unit 103, which does
the matching by calculating acoustic scores (¶ 0122). ¶ 0121 lines 1-4 teach a distance
calculator (unit 47 in Fig. 3) enabling calculation of the distance of the robot from a user
uttering speech (speaker).

It would have therefore been obvious to one with ordinary skill in the art at the 
time the invention was made to compose an acoustic model corresponding to a sound 
source using a linear interpolation of acoustic models which correspond to the distances 
closest to the said sound source and train the acoustic model by changing the 
coefficients of the linear interpolation to achieve the best score, and thereby compose a 
more realistic acoustic model for the said sound source resulting in better performance 
by the robot. 
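
By way of illustration only, the linear interpolation described above can be sketched as
follows. The sketch assumes, hypothetically, that each stored distance-dependent acoustic
model is reduced to a flat list of numeric parameters; the function name and the data
layout are not taken from Asano or from the application. The interpolation weight here is
fixed by the distances themselves, whereas claim 6 recites weights determined by
training, which this sketch does not attempt to show.

```python
from bisect import bisect_left


def interpolate_model(distances, models, target):
    """Blend the two stored models whose distances bracket `target`.

    distances: ascending list of distances at which models were stored.
    models:    list of parameter vectors (hypothetical flat representation),
               one per entry in `distances`.
    target:    measured distance of the sound source.
    """
    # Outside the stored range, fall back to the nearest stored model.
    if target <= distances[0]:
        return list(models[0])
    if target >= distances[-1]:
        return list(models[-1])

    hi = bisect_left(distances, target)  # first stored distance >= target
    lo = hi - 1
    # Weight of the farther model grows linearly as target approaches it.
    w = (target - distances[lo]) / (distances[hi] - distances[lo])
    return [(1.0 - w) * a + w * b for a, b in zip(models[lo], models[hi])]


stored_distances = [1.0, 2.0, 4.0]
stored_models = [[0.0, 10.0], [2.0, 20.0], [6.0, 40.0]]
blended = interpolate_model(stored_distances, stored_models, 3.0)
```

In a trained variant, the weight `w` would be replaced by a coefficient adjusted to
maximize the acoustic score, which is the step the obviousness reasoning above relies on.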

For extension of this discussion from distance dependent to direction dependent 
acoustic models please see the obviousness of claim 1.

Regarding claim 7, Asano does teach a system according to claim 1, further
comprising a speaker identification module, wherein the acoustic model memory is
further configured to possess the direction-dependent acoustic models for respective
speakers (identical to claim 1, 3rd limitation, and rejected under similar rationale),

and wherein the acoustic model composition module is further configured to: 

refer to direction-dependent acoustic models of a speaker who is identified by the
speaker identifying module and to a sound direction localized by the sound source
localization module (¶ 0116 teaches that the acoustic model database has the acoustic
models of speakers (specific speakers) at specified locations; ¶ 0159 teaches that those
acoustic models are produced from the speech data acquired by a microphone placed
close to the mouth of the speakers for better voice recognition, leading to better speaker
identification; ¶ 0130 lines 5-1 above ¶ 0131 teach that images of the face of a user (e.g., a
potential speaker) taken by the robot's CCD cameras (modules 22L and 22R in Fig. 2)
are used as a reference pattern (stored) for image recognition (user identification);
finally, ¶ 0145 lines 1-5 teach that the robot uses both detection of a user uttering
(identifying a user by his voice) and image recognition (identifying a user by his image) in
orienting its head unit in the direction of the user; i.e., speaker identification is done by
both image pattern recognition and voice recognition using the stored acoustic
models of the user (speaker));

compose an acoustic model for the sound direction based on the direction- 
dependent acoustic models in the acoustic model memory; and storing the acoustic 
model in the acoustic model memory (corresponds to the third limitation of the first claim 
and rejected under similar rationale). 

For obviousness analysis please see claim 1.

3. Claims 2-3, 8-12 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Asano, and further in view of Ito et al. (US Patent 7,076,433).

Regarding claim 2, Asano does teach an automatic speech recognition system, 
which recognizes speeches of a specified speaker in acoustic signals detected by a 
plurality of microphones as character information, the system comprising: 

a sound source localization module configured to localize a sound direction 
corresponding to the specified speaker based on the acoustic signals detected by the 
plurality of microphones (identical to the first limitation of claim 1 and rejected under 
similar rationale); 

an acoustic model memory configured to store distance-dependent acoustic
models that are adjusted to a plurality of directions at intervals (identical to the 3rd
limitation of claim 1 and rejected under similar rationale);

an acoustic model composition module configured to compose an acoustic model
adjusted to the sound direction, which is localized by the sound source localization
module, based on the distance-dependent acoustic models in the acoustic model
memory, the acoustic model composition module storing the acoustic model in the
acoustic model memory (identical to the 4th limitation of claim 1 and rejected under
similar rationale);

and a speech recognition module configured to recognize the features extracted
by the feature extractor as character information using the acoustic model composed by
the acoustic model composition module (identical to the 5th limitation of claim 1 and
rejected under similar rationale).

Asano does not specifically disclose a sound source separation module which 
separates speech signals of the specified speaker from the acoustic signals based on 
the sound direction localized by the sound source localization module;

a feature extractor configured to extract features of the speech signals separated 
by the sound source separation module; 

an acoustic model memory configured to store direction-dependent acoustic 
models that are adjusted to a plurality of directions at intervals corresponding to speech 
signals. 

Ito et al. does teach a sound source separation apparatus which in one
embodiment separates sound (e.g., attributed to a speaker) from a mixed input signal
(acoustic signal) by utilizing sound source direction as an acoustic feature (Abstract,
lines 1-2 of the first ¶ and line 4 from the bottom; Col. 18 lines 61 to 66, referring to
Fig. 16 unit 911). Furthermore, it teaches that its acoustic feature extractor incorporates a
sound source direction prediction layer which aids in determination of the peaks
corresponding to acoustic features of the sound source incident from a certain direction
(Col. 19 lines 33-40 and module 921 in Fig. 16).

It would have therefore been obvious to one with ordinary skill in the art at the
time the invention was made that incorporating the modules 915 and 921 in Fig. 16 of Ito et
al. into the feature extractor unit 101 of Fig. 9 of Asano would enable the latter to
separate a sound (e.g., a speech signal) from a mixed input when the sound is incident
from a certain direction by utilizing its direction-dependent features, enabling the robot to
obtain a bias for direction for a certain speaker and to point in the direction of the source
of sound, so as to reduce detection of wrong signals and enhance its speech recognition
performance. Storage of direction-dependent features also avoids recalculation
of the said features and helps in raising efficiency.

Regarding claim 3, Asano does suggest a system according to claim 1, wherein
the sound source localization module is further configured to: 

acquire an intensity difference and a phase difference for the harmonic
relationships extracted through the plurality of microphones (¶ 0129 lines 1-5 teach
acquiring power (intensity) and phase differences of speech signals (which maintain
harmonic spectra) picked up by the microphones of a device (e.g., a robot));

acquire belief factors for a sound direction based on the intensity difference and 
the phase difference, respectively; and 

determine a most probable sound direction (utilizing power and phase difference
in determining the direction of sound as disclosed in ¶ 0129 lines 1-5 does inherently
involve these or equivalent steps to achieve the same net outcome (sound direction
determination)).

Asano does not specifically disclose: 

To perform a frequency analysis for the acoustic signals detected by the
microphones to extract harmonic relationships.

Ito et al. does disclose performing a frequency analysis for the acoustic signals
detected by the microphones to extract harmonic relationships (Col. 12 lines 45-50
teach that a harmonic calculation layer (named the intermediate feature extraction layer,
unit 107 in Fig. 9) determines harmonic features of features (attributed to acoustic
analysis of speech signals) at each time based on their frequency variation rates (i.e., by
doing frequency analysis)).

It would have therefore been obvious to one with ordinary skill in the art at the
time the invention was made that incorporating the harmonic calculation layer (unit 107 in
Fig. 9 of Ito et al.) into the feature extractor unit 101 of Fig. 9 of Asano would enable
Asano to extract harmonic structures attributed to speech signals incident on the
microphones of the robot prior to obtaining their phase and power differences for the sake
of determining the direction of the speech, and thereby eliminate redundant parts of the
spectra in determining the direction of speech, which primarily includes harmonics, and
thereby enhance efficiency and accuracy.

Regarding claim 8, the preamble and the 1st, 4th, 5th, 6th and 7th limitations
correspond to the preamble and the 1st, 2nd, 3rd, 4th and 5th limitations, respectively, of
claim 1 and are therefore rejected under similar rationale over Asano.

Limitation 3 corresponds to limitation 2 of claim 2 and is therefore
rejected under similar rationale by Ito et al.

Regarding the 2nd limitation, storing direction-dependent acoustic models at time
intervals (the 3rd limitation of claim 1) will amount to storing the sound direction at time
intervals corresponding to the speech attributed to a sound source, and the storage
medium used (i.e., the memory unit 42 in Fig. 3) will give the unit the functionality of
the stream tracking module claimed in this limitation, which will enable estimating the
current position of a sound source by simple interpolation.
For obviousness please see claims 1 and 2.

Regarding claims 9, 10, 11, 12, they correspond to claims 3, 4, 6, 7 respectively 
with identical limitations and are therefore rejected under similar rationales. 

4. Claim 5 is rejected under 35 U.S.C. 103(a) as being unpatentable over Asano in
view of Ito et al., and further in view of Okuno et al. (US Patent 7,035,418).

Regarding claim 5, Asano in view of Ito et al. does not specifically disclose a 
system according to claim 2, wherein the sound source separation module is further 
configured to employ an active direction-pass filter so as to separate speeches, the filter 
is configured to: 

separate speeches by a narrower directional band when a sound direction, which 
is localized by the sound source localization module, lies close to a front, which is 
defined by an arrangement of the plurality of microphones; 

and separate speeches by a wider directional band when the sound direction lies 
apart from the front. 

Okuno et al. does teach a directional filter for sound based on the position of the
source, enabling localization and extraction of sound (speech) information of the said
sound source, which will thereby enable it to separate that source from other sound
(speech) sources (Col. 4 lines 25-34); Col. 8 lines 1-7, referring to the flow chart in Fig.
8, identify steps ST5 and ST6 as the steps associated with the directional filter's
operation; Col. 8 lines 15-20 note the results of the directional filter on detecting directions
of sound from three sources (A, B and C in Fig. 7) and note that the angular range
(directional band) within which these directions are determined has been reduced
due to the application of the filter; Col. 11 lines 48-52 teach that the directional filter
functions according to the direction and position of the sound source, which is
determined first by computing the difference between the phase and intensity of sound
received at two receivers from the same source, which is directly related to their path
difference; Col. 6 lines 32-35, referring to Fig. 4, explicitly show a relationship (d = D
sin(theta)) between the path difference of the two "sound" signals (attributed to the source
position and called "d"), the distance between the receiving microphones (called
"D") and the angle governing the direction of the sound source with respect to an axis at
right angles to the line connecting the two microphones (called "theta"), which shows the
angle, and thereby the angular range (directional band), to increase with that path
difference.
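
For illustration only, the relationship d = D sin(theta) cited above can be sketched as
follows. The function names, units and numeric values are hypothetical and are not taken
from Okuno et al.; the sensitivity function is simply the derivative of theta with respect
to d that follows from the same relationship, shown to indicate how a fixed error in the
path difference maps to a larger angular spread as the source moves away from the front.

```python
import math


def path_difference(mic_spacing, theta_deg):
    """Path difference d = D * sin(theta) for microphones spaced D apart."""
    return mic_spacing * math.sin(math.radians(theta_deg))


def direction_from_path_difference(mic_spacing, d):
    """Invert d = D * sin(theta) to recover theta in degrees."""
    # Clamp so that small measurement noise cannot push asin out of domain.
    ratio = max(-1.0, min(1.0, d / mic_spacing))
    return math.degrees(math.asin(ratio))


def angle_sensitivity(mic_spacing, theta_deg):
    """d(theta)/d(d) = 1 / (D * cos(theta)), in radians per unit length."""
    return 1.0 / (mic_spacing * math.cos(math.radians(theta_deg)))


spacing = 0.2  # hypothetical microphone spacing, in metres
d = path_difference(spacing, 30.0)
theta = direction_from_path_difference(spacing, d)
front = angle_sensitivity(spacing, 0.0)
side = angle_sensitivity(spacing, 60.0)
```

Under these assumptions `side` is larger than `front`, which is consistent with using a
narrower directional band near the front and a wider one away from it.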

It would have therefore been obvious to one with ordinary skill in the art at the
time the invention was made that incorporating these methods (i.e., steps ST5 and ST6 in
Fig. 8 of Okuno et al.) for directional filtering into the flow chart of Fig. 10 of Asano
(after step S2) would enable the latter to apply the distance-dependent directional filter of
Okuno et al., which would result in less accuracy and a larger angular range (wider
directional band) for further sound sources, since the incoming signals possess longer
path differences and a larger path difference (i.e., larger "d" in the relationship above)
leads to a larger angular range (i.e., larger deviations (directional band) in "theta").
Application of the directional filter will reduce the error in determining the direction of
the sound source in general.



Conclusion 

The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure: Maekawa et al. (US Patent 6,471,420), Almstrand et al. (US
2003/0229495).

5. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is
(571)270-5860. The examiner can normally be reached on M-F 8:30AM-5:00 PM EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Talivaldis I. Smits can be reached on (571)272-7628. The fax phone 